---
title: "Mixed Effects Models Mini-Series. Part III. Detect and embrace temporal and spatial non-independence"
author: "Carsten F. Dormann"
date: "17 December 2014"
output:
  html_document:
    fig_width: 5
    keep_md: yes
    number_sections: yes
    toc: yes
  pdf_document:
    toc: yes
---
This session will look at a different type of non-independence than the previous one. While there the data were non-independent by design, in this session they are (or may not be) non-independent due to mechanistic ecological processes (such as the dispersal of animals) or statistical artefacts (such as “forgetting” to include an important predictor). Key terms are: temporal and spatial autocorrelation, autocorrelation function (ACF), generalised least squares (GLS), correlation structure, repeated measurements, correlogram and variogram.
Imagine a data set consisting of repeated measurements of, say, CO2 in the atmosphere (plots should ALWAYS be square, except in the case of time series and maps):
par(las=1) # globally set las to 1!
plot(co2)
We are interested in whether there is a significant trend over time, thus time is our fixed effect. Let’s start with a simple linear model and see where we go. To do so, we first have to convert this time-series object into two vectors, one with the CO2 concentrations and one with the date.
TIME <- as.vector(time(co2))
CO2 <- as.vector(co2)
fm1 <- lm(CO2 ~ TIME)
summary(fm1)
##
## Call:
## lm(formula = CO2 ~ TIME)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.040 -1.948 -0.002 1.911 6.515
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.25e+03 2.13e+01 -106 <2e-16 ***
## TIME 1.31e+00 1.07e-02 122 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.62 on 466 degrees of freedom
## Multiple R-squared: 0.969, Adjusted R-squared: 0.969
## F-statistic: 1.48e+04 on 1 and 466 DF, p-value: <2e-16
The TIME effect is clearly important and highly predictive (with an R2 = 0.97) and we can do even better with a polynomial of TIME:
fm2 <- lm(CO2 ~ poly(TIME, 2))
fm3 <- lm(CO2 ~ poly(TIME, 3))
fm4 <- lm(CO2 ~ poly(TIME, 4))
anova(fm1, fm2, fm3, fm4)
## Analysis of Variance Table
##
## Model 1: CO2 ~ TIME
## Model 2: CO2 ~ poly(TIME, 2)
## Model 3: CO2 ~ poly(TIME, 3)
## Model 4: CO2 ~ poly(TIME, 4)
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 466 3194
## 2 465 2214 1 980 220.08 < 2e-16 ***
## 3 464 2067 1 148 33.23 1.5e-08 ***
## 4 463 2061 1 6 1.26 0.26
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
So a cubic function fits best:
plot(CO2 ~ TIME, type="l", lwd=2)
lines(TIME, predict(fm3), col="red", lwd=2)
From the last session, we have the creepy feeling that this regression is somehow not correct. We know that data points are not independent, since they are taken in exactly the same place, month after month. But that is not necessarily a problem! We account for this by using TIME as predictor. This may already be enough to take the temporal dependence of the data into account. To find out whether our data indeed have a temporal dependence in the response which is not accounted for by the model, we use a diagnostic tool called the autocorrelation function, short ACF. The ACF creates a new version of the CO2 series, which is displaced against the original by 1, 2, 3, … time units. Then the displaced vector is correlated with the original and the correlation value is plotted against the displacement (called the lag). The result looks like this:
acf(residuals(fm3))
Thus, as we lag the data set by 1, 2, 3, … months, the correlation decreases down to a random value (indicated by the stippled blue lines), then becomes increasingly negative, reaching its strongest negative value at lag 6. This means that CO2 concentrations half a year (and one year, two years, …) ahead can be predicted extremely well from the current value, despite the fact that our model already accounts for a trend in time!
The plot shows that we have temporal autocorrelation in our model residuals, indicating that they are indeed not independent!
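If you want to convince yourself what acf() actually computes, here is a minimal hand-rolled sketch (the helper lag.cor is my addition; acf() normalises by the overall variance, so the values agree only approximately with plain correlations):

```r
# Shift the residual series by k months and correlate the shifted values
# with the original ones; do this for lags 1 to 12.
res <- residuals(fm3)
lag.cor <- function(x, k) cor(head(x, -k), tail(x, -k))
round(sapply(1:12, function(k) lag.cor(res, k)), 2)
```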
We are now left with two possible ways forward:

1. move to a dedicated time-series framework (e.g. ARIMA-type models) and model the autocorrelation there, or
2. keep our regression model, but tell it about the correlation structure of its residuals.

Let's start with the second option, since this is a mixed-effect model series, not a time series analysis workshop.
The key trick is to tell our model that data points nearer in time are more correlated than those further apart. Thus, correlation is a function of time-distance. But, you may say, the pattern repeats over and over again, so data points 2 years apart are more correlated than those 3 months apart. Well, yes. But they are so closely correlated because the correlation carries over from one lag to the next. So if I know the correlation at lag 12, I also know it (roughly) at lags 24, 36 and so forth. Thus, we don't need to model each lag-distance separately, because each is automatically predicted from the lag 12 months earlier.
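To make the carry-over argument concrete: for a first-order autoregressive (AR(1)) error process, which is the structure we will fit first below, the correlation between residuals \(k\) steps apart is determined entirely by the lag-1 parameter: \[ e_t = \varphi\, e_{t-1} + u_t \quad \Rightarrow \quad \text{cor}(e_t, e_{t+k}) = \varphi^k . \] Once \(\varphi\) is estimated, the correlation at every larger lag follows automatically.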
To visualise this, we use the partial autocorrelation function:
pacf(residuals(fm3))
The partial autocorrelation function shows us the lag effect after accounting for what can be predicted from the previous lag effects.
In this case, it is a bit messy, since the autocorrelation doesn’t simply fade away with larger lag, but goes up and down in a damped oscillation. Anyway, this is an example, and we shall now try to modify the lm to account for temporal autocorrelation.
The way to do so is called “Generalised Least Squares”, short GLS. In addition to the standard linear model, it also models the way that data points covary with each other. In this case, we expect data points one month apart to be highly positively correlated, those half a year apart to be negatively correlated, and so forth. If we imagine that each data point is a random variate drawn from some distribution, then we can think of a variance-covariance matrix of the data: each data point has its own column and row. On the diagonal, we have the variances, and on the off-diagonal, the covariances among data points. In a linear model, the off-diagonal entries are 0, i.e. data points are drawn independently of one another (and the diagonal is constant, i.e. all data points have the same variance). In a GLS, we can make the off-diagonal entries a function of the temporal distance between data points! Isn't that cool?
To repeat this in a more mathematical way: a LM is typically written as \[y = X\beta + e \text{ , where }e\text{ is from a normal distribution: }e \sim N(0, \sigma).\] This latter equation means that all errors have the same variance and are independent of each other (also abbreviated as iid). Under such “normal” circumstances, the estimated model parameters are \(\widehat \beta_{OLS} = (X' X)^{-1} X' y\). This changes in a GLS, where our errors are not iid any more: \[e \sim N(0, \Omega) \text{ , with } \Omega = \text{ a full } n \times n \text{ matrix,}\] with elements on the diagonal (resembling the original \(\sigma\)) and off-diagonal. Now, since every data point is represented by a row and a column, we can use the distances (in time) between data points, collected in a same-size matrix \(D\), as the basis for parametrising the off-diagonal cells: \[ \Omega \sim e^{-\varphi D}. \] So the larger the distance between two data points, the lower the entry in the variance-covariance matrix \(\Omega\). By fine-tuning the range of temporal autocorrelation \(\varphi\), we can accommodate different lags of temporal autocorrelation. (All this applies in the same way to spatial autocorrelation, see below.)
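For completeness (standard GLS theory, not spelled out above): the estimator then becomes \[\widehat \beta_{GLS} = (X' \Omega^{-1} X)^{-1} X' \Omega^{-1} y,\] which reduces to the OLS formula when \(\Omega\) is proportional to the identity matrix. And here is a tiny numerical sketch of such an exponentially decaying correlation matrix for the first five observations (phi is a made-up value; in a GLS it is estimated from the data):

```r
# Build the temporal distance matrix D for the first five months and turn it
# into a correlation matrix via exp(-phi * D).
D <- as.matrix(dist(TIME[1:5]))
phi <- 2                      # invented decay parameter, for illustration only
round(exp(-phi * D), 3)       # 1 on the diagonal, decaying off-diagonal entries
```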
In R, we use the gls-function from package nlme. (Actually, this function is also internally behind the lme function.)
library(nlme)
fgls <- gls(CO2 ~ poly(TIME, 3), correlation=corAR1(form=~TIME))
summary(fgls)
## Generalized least squares fit by REML
## Model: CO2 ~ poly(TIME, 3)
## Data: NULL
## AIC BIC logLik
## 1465 1490 -726.7
##
## Correlation Structure: ARMA(1,0)
## Formula: ~TIME
## Parameter estimate(s):
## Phi1
## 0.9511
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 351.3 3.51 100.05 0.0000
## poly(TIME, 3)1 383.5 61.86 6.20 0.0000
## poly(TIME, 3)2 -36.7 43.37 -0.85 0.3980
## poly(TIME, 3)3 10.5 35.81 0.29 0.7695
##
## Correlation:
## (Intr) p(TIME,3)1 p(TIME,3)2
## poly(TIME, 3)1 0.237
## poly(TIME, 3)2 -0.361 0.397
## poly(TIME, 3)3 0.155 -0.520 0.259
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -18.466 -14.849 -12.871 -10.647 1.373
##
## Residual standard error: 1.185
## Degrees of freedom: 468 total; 464 residual
acf(residuals(fgls))
We see that a correlation of 0.95 was fitted for a lag of 1 (parameter estimate Phi1), so only one lag was accommodated (hence: corAR1). A look at the ACF of the residuals shows that this is not nearly good enough.
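To see what this single parameter implies (a small check, not in the original analysis): under corAR1 the implied residual correlation at lag \(k\) is simply \(\Phi_1^k\), a smooth geometric decay, which cannot reproduce the seasonal up-and-down pattern of the ACF:

```r
# Correlations implied by the fitted AR(1) parameter at lags 1 to 12:
round(0.9511^(1:12), 2)
```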
We can construct much more complicated correlation structures using the corARMA function, but this is only for illustration:
COR3 <- corARMA(form=~TIME, p=3, q=0)
COR3 <- Initialize(COR3, data=data.frame("TIME"=TIME))
flag3 <- gls(CO2 ~ poly(TIME, 3), correlation=COR3) # takes a while
summary(flag3)
## Generalized least squares fit by REML
## Model: CO2 ~ poly(TIME, 3)
## Data: NULL
## AIC BIC logLik
## 1469 1502 -726.7
##
## Correlation Structure: ARMA(3,0)
## Formula: ~TIME
## Parameter estimate(s):
## Phi1 Phi2 Phi3
## 1.8094 -1.3810 0.5404
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 351.5 3.54 99.28 0.0000
## poly(TIME, 3)1 393.2 59.36 6.62 0.0000
## poly(TIME, 3)2 -32.9 41.35 -0.80 0.4264
## poly(TIME, 3)3 9.1 34.02 0.27 0.7903
##
## Correlation:
## (Intr) p(TIME,3)1 p(TIME,3)2
## poly(TIME, 3)1 0.241
## poly(TIME, 3)2 -0.413 0.368
## poly(TIME, 3)3 0.141 -0.543 0.228
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -18.472 -15.121 -13.098 -10.914 1.406
##
## Residual standard error: 1.185
## Degrees of freedom: 468 total; 464 residual
acf(residuals(flag3))
Can we be smarter about how to embrace multi-lag dependence than trial-and-error? Luckily, someone else has thought of an automatic way to fit autoregressive models.
library(forecast)
## Loading required package: zoo
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## Loading required package: timeDate
## This is forecast 5.5
##
##
## Attaching package: 'forecast'
##
## The following object is masked from 'package:nlme':
##
## getResponse
fautogls <- auto.arima(CO2, xreg=poly(TIME, 3))
## Warning: Unable to fit final model using maximum likelihood. AIC value
## approximated
summary(fautogls)
## Series: CO2
## ARIMA(4,0,3) with non-zero mean
##
## Coefficients:
## ar1 ar2 ar3 ar4 ma1 ma2 ma3 intercept
## 2.692 -3.635 2.638 -0.952 -1.405 1.265 -0.337 337.049
## s.e. 0.015 0.030 0.030 0.014 0.037 0.045 0.040 0.043
## 1 2 3
## 319.415 31.089 -10.694
## s.e. 0.929 0.932 0.937
##
## sigma^2 estimated as 0.207: log likelihood=-295.8
## AIC=616.8 AICc=617.5 BIC=666.6
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.0007054 0.4533 0.3561 6.182e-05 0.1057 0.3309 -0.08238
acf(residuals(fautogls))
This residual autocorrelation looks much better indeed. There are occasional peaks here and there, but the overall level is dramatically reduced. Let's plot this model onto the data and see what else can be done.
plot(TIME, CO2, type="l", lwd=3)
lines(TIME, fitted.values(fautogls), col="red", lwd=2, lty=2)
A marvellous fit! The key thing to notice here is that the seasonal pattern of CO2 is actually fitted using the covariance matrix, not using a seasonal predictor. We shouldn’t do that, if we can avoid it, because we may want to interpret this seasonality as an actual ecological process, rather than a statistical nuisance.
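A standard way to encode an annual cycle directly as a (fixed-effect) predictor is a sine/cosine pair. Since TIME is measured in years, \(\cos(2\pi t)\) and \(\sin(2\pi t)\) complete exactly one cycle per year, and together they can represent a seasonal wave of any amplitude and phase: \[ a\cos(2\pi t) + b\sin(2\pi t) = A\cos(2\pi t - \delta), \qquad A = \sqrt{a^2 + b^2}, \quad \tan\delta = b/a. \]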
So, quickly, here is a way to put seasonality into a GLS:
COS <- cos(2*pi*TIME)
SIN <- sin(2*pi*TIME)
fglsseason <- gls(CO2 ~ poly(TIME, 3) + COS + SIN, correlation=COR3)
summary(fglsseason)
## Generalized least squares fit by REML
## Model: CO2 ~ poly(TIME, 3) + COS + SIN
## Data: NULL
## AIC BIC logLik
## 892 933.3 -436
##
## Correlation Structure: ARMA(3,0)
## Formula: ~TIME
## Parameter estimate(s):
## Phi1 Phi2 Phi3
## 1.6733 -1.3092 0.5955
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 330.0 1.906 173.14 0.0000
## poly(TIME, 3)1 293.3 29.690 9.88 0.0000
## poly(TIME, 3)2 71.1 20.831 3.41 0.0007
## poly(TIME, 3)3 -14.6 17.091 -0.86 0.3926
## COS -0.4 0.080 -4.49 0.0000
## SIN 2.9 0.086 33.83 0.0000
##
## Correlation:
## (Intr) p(TIME,3)1 p(TIME,3)2 p(TIME,3)3 COS
## poly(TIME, 3)1 0.255
## poly(TIME, 3)2 -0.453 0.329
## poly(TIME, 3)3 0.139 -0.550 0.213
## COS -0.135 -0.033 0.030 -0.015
## SIN -0.343 -0.103 0.143 -0.042 0.037
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -2.188 9.153 12.388 13.893 16.171
##
## Residual standard error: 0.63
## Degrees of freedom: 468 total; 462 residual
AIC(fgls)
## [1] 1465
AIC(fglsseason) # much better fit!
## [1] 892
And now the same for the auto.arima (which requires the predictors to be handed over as a matrix):
fautoglsseason <- auto.arima(CO2, xreg=cbind(poly(TIME,3), COS, SIN))
summary(fautogls)
## Series: CO2
## ARIMA(4,0,3) with non-zero mean
##
## Coefficients:
## ar1 ar2 ar3 ar4 ma1 ma2 ma3 intercept
## 2.692 -3.635 2.638 -0.952 -1.405 1.265 -0.337 337.049
## s.e. 0.015 0.030 0.030 0.014 0.037 0.045 0.040 0.043
## 1 2 3
## 319.415 31.089 -10.694
## s.e. 0.929 0.932 0.937
##
## sigma^2 estimated as 0.207: log likelihood=-295.8
## AIC=616.8 AICc=617.5 BIC=666.6
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 0.0007054 0.4533 0.3561 6.182e-05 0.1057 0.3309 -0.08238
summary(fautoglsseason)
## Series: CO2
## ARIMA(2,0,2) with non-zero mean
##
## Coefficients:
## ar1 ar2 ma1 ma2 intercept 1 2 3
## 0.875 -0.705 0.073 0.429 337.055 319.213 31.363 -10.995
## s.e. 0.049 0.047 0.068 0.048 0.037 0.795 0.795 0.796
## COS SIN
## -0.393 2.747
## s.e. 0.062 0.062
##
## sigma^2 estimated as 0.192: log likelihood=-279.4
## AIC=580.9 AICc=581.5 BIC=626.5
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.0002279 0.4387 0.349 -0.0002289 0.1033 0.3242 0.1085
AIC(fautogls)
## [1] 615.6
AIC(fautoglsseason)
## [1] 580.9
AIC(fglsseason)
## [1] 892
plot(TIME, CO2, type="l", lwd=3)
lines(TIME, fitted.values(fautogls), col="red", lwd=2, lty=2)
lines(TIME, fitted.values(fautoglsseason), col="green", lwd=2, lty=3)
A very common design is to sample units (= subjects, plots) repeatedly, a.k.a. repeated measurements. Here we have to tell the model the structure (i.e. samples within plots over time) as well as attempt to represent the temporal dependency itself. Currently, only lme and mgcv::gamm can handle both a random effect and a correlation structure as in the previous time example.
Let’s take a typical example (actually not so typical, but rather exceptional in the long time series these data constitute). In this case, forest plots were treated in three different ways (control, logging, logging and thinning) and monitored repeatedly over decades. Response is basal area (m2/ha). Additionally, the replicated treatments are arranged in five blocks. First, we make a nice “German-colour scheme” plot of the data.
dats <- read.csv("Data_Angela2.csv")
summary(dats)
## Block Treat year plot ba
## Min. :1.00 C :42 Min. :1983 Min. :110 Min. : 3.88
## 1st Qu.:2.00 L :70 1st Qu.:1987 1st Qu.:206 1st Qu.: 6.08
## Median :3.00 LLTI:49 Median :1995 Median :305 Median : 6.95
## Mean :3.17 Mean :1997 Mean :324 Mean : 6.89
## 3rd Qu.:5.00 3rd Qu.:2008 3rd Qu.:504 3rd Qu.: 7.63
## Max. :5.00 Max. :2012 Max. :512 Max. :10.05
dats$Block <- as.factor(dats$Block)
dats$plot <- as.factor(dats$plot)
dats$year <- dats$year - 1983 # reset first year to 0
attach(dats)
plot(ba ~ year, las=1, type="n")
points(ba ~ year, data=dats[Treat=="C",], pch=16, cex=1.5)
points(ba ~ I(year+0.5), data=dats[Treat=="L",], pch=17, cex=1.5, col="red")
points(ba ~ I(year+1), data=dats[Treat=="LLTI",], pch=18, cex=1.5, col="gold")
matlines(unique(year), t(tapply(ba, list(Treat, year), mean)), lwd=2, col=c("black", "red", "gold"), lty=1)
legend("topleft", legend=c("control", "logging", "logging & thinning"), col=c("black", "red", "gold"), lty=1, lwd=2, bty="n", cex=1.25, pch=16:18)
Next, we try to fit a model for the control treatment only, just as a warm-up exercise. We need to tell the model that plots are nested in Block and that through the years there is a (linear) dependence. For some reason, the grouping structure of the random term and of the correlation structure must be identical. Thus, we fit a correlation over year for each plot in each Block (which actually makes sense, too).
library(nlme)
flmeC <- lme(ba ~ year, random=~1|Block/plot, correlation=corLin(form=~year|Block/plot), data=dats[Treat=="C",], control=list(maxIter=5000))
summary(flmeC)
## Linear mixed-effects model fit by REML
## Data: dats[Treat == "C", ]
## AIC BIC logLik
## 62.71 72.84 -25.35
##
## Random effects:
## Formula: ~1 | Block
## (Intercept)
## StdDev: 0.2021
##
## Formula: ~1 | plot %in% Block
## (Intercept) Residual
## StdDev: 0.4374 0.4912
##
## Correlation Structure: Linear spatial correlation
## Formula: ~year | Block/plot
## Parameter estimate(s):
## range
## 12.59
## Fixed effects: ba ~ year
## Value Std.Error DF t-value p-value
## (Intercept) 7.022 0.3231 35 21.729 0.0000
## year 0.016 0.0097 35 1.656 0.1066
## Correlation:
## (Intr)
## year -0.425
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -1.67294 -0.34034 -0.02126 0.74963 1.90458
##
## Number of Observations: 42
## Number of Groups:
## Block plot %in% Block
## 1 6
We included year as a fixed effect to test whether there was a significant change in basal area over time (there wasn't, phew, because this is the control and there shouldn't be one).
Now we can ramp up the model and do the same thing across the three treatments:
flme <- lme(ba ~ Treat*year, random=~1|Block/plot, correlation=corLin(form=~year|Block/plot), data=dats, control=list(maxIter=5000))
anova(flme)
## numDF denDF F-value p-value
## (Intercept) 1 135 315.36 <.0001
## Treat 2 16 0.29 0.7486
## year 1 135 52.76 <.0001
## Treat:year 2 135 3.73 0.0265
summary(flme)
## Linear mixed-effects model fit by REML
## Data: dats
## AIC BIC logLik
## 292.6 323 -136.3
##
## Random effects:
## Formula: ~1 | Block
## (Intercept)
## StdDev: 0.7377
##
## Formula: ~1 | plot %in% Block
## (Intercept) Residual
## StdDev: 0.815 0.6014
##
## Correlation Structure: Linear spatial correlation
## Formula: ~year | Block/plot
## Parameter estimate(s):
## range
## 12.37
## Fixed effects: ba ~ Treat * year
## Value Std.Error DF t-value p-value
## (Intercept) 7.019 0.8377 135 8.378 0.0000
## TreatL -1.085 0.9682 16 -1.121 0.2790
## TreatLLTI -1.287 0.9991 16 -1.288 0.2161
## year 0.016 0.0118 135 1.366 0.1743
## TreatL:year 0.037 0.0149 135 2.445 0.0158
## TreatLLTI:year 0.039 0.0161 135 2.410 0.0173
## Correlation:
## (Intr) TreatL TrLLTI year TrtL:y
## TreatL -0.865
## TreatLLTI -0.838 0.858
## year -0.200 0.173 0.168
## TreatL:year 0.158 -0.219 -0.132 -0.791
## TreatLLTI:year 0.147 -0.127 -0.228 -0.734 0.580
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.39787 -0.34422 0.08586 0.63714 1.88302
##
## Number of Observations: 161
## Number of Groups:
## Block plot %in% Block
## 5 23
We now see that the three treatments differ in their trajectory over time! They have the same intercept (basal area in year 0 = 1983), as indicated by the non-significant effect of Treat. The basal area increases with time (year effect), but this time effect is different for the three treatments (Treat:year interaction significant). Specifically, for the control there is no trend over time (year in the summary is not significant), but both treatments exhibit a trend (the interaction effects are significant).
Our model now has various parameters fitted, but how can we “simply” visualise the effect of treatment across years? To do so, we need to compute the expected value for each level of the random effect(s), for the desired values of the fixed effects. In other words, if we want the value for, say, control in 1990, we need to compute the model prediction for each block/plot combination and then average those, to get the population-level prediction. Let us as an example extract the expected value for one specific plot, treatment and year. (Beware: we rescaled year to start at 0 in 1983, so 1990 would actually be year = 7; plugging in year = 1990, as below, therefore extrapolates far beyond the data, which is why the predicted basal area comes out implausibly large.)
newdata <- data.frame("Treat"="C", year=1990, Block="5", plot="504")
predict(flme, newdata=newdata)
## 5/504
## 39.9
## attr(,"label")
## [1] "Predicted values"
Luckily there is a built-in option in the predict-function for lme that aggregates across all random effects if we are only interested in the population-level prediction. If we are additionally interested in the confidence interval of this prediction (and I think we should be), then we need to resort to another package and function.
newdata <- data.frame("Treat"="C", year=1990)
predict(flme, newdata=newdata, level=0)
## [1] 39.1
## attr(,"label")
## [1] "Predicted values"
library(AICcmodavg)
predictSE(flme, newdata=newdata, level=0)
## $fit
## [1] 39.1
##
## $se.fit
## [1] 23.34
The level=0 argument instructs predict to do this averaging over all random effects for us and to return only the population-level estimate. Note, however, that when we do supply Block and plot, we cannot just take arbitrary values, but rather must use combinations that actually exist!
So now turn this into a nice plot:
attach(dats)
## The following objects are masked from dats (position 4):
##
## ba, Block, plot, Treat, year
plot(ba ~ year, las=1, type="n", xlim=c(0, 35))
points(ba ~ year, data=dats[Treat=="C",], pch=16, cex=1.5)
points(ba ~ I(year+0.5), data=dats[Treat=="L",], pch=17, cex=1.5, col="red")
points(ba ~ I(year+1), data=dats[Treat=="LLTI",], pch=18, cex=1.5, col="gold")
newC <- data.frame("Treat"="C", year=0:35)
predsC <- predictSE(flme, newdata=newC, level=0)
matlines(0:35, cbind(predsC$fit,predsC$fit + 2*predsC$se.fit, predsC$fit - 2*predsC$se.fit), lwd=2, lty=c(1,2,2), col="black")
newL <- data.frame("Treat"="L", year=0:35)
predsL <- predictSE(flme, newdata=newL, level=0)
matlines(0:35, cbind(predsL$fit, predsL$fit + 2*predsL$se.fit, predsL$fit - 2*predsL$se.fit), lwd=2, lty=c(1,2,2), col="red")
newLT <- data.frame("Treat"="LLTI", year=0:35)
predsLT <- predictSE(flme, newdata=newLT, level=0)
matlines(0:35, cbind(predsLT$fit, predsLT$fit + 2*predsLT$se.fit, predsLT$fit - 2*predsLT$se.fit), lwd=2, lty=c(1,2,2), col="gold")
legend("topleft", legend=c("control", "logging", "logging & thinning"), col=c("black", "red", "gold"), lty=1, lwd=2, bty="n", cex=1.25, pch=16:18)
You may wonder why the approximately 95% confidence intervals run so parallel to the expected values. The main reason is that they depict only the uncertainty of the fixed effects. Another is that I have not extrapolated much beyond the data; there you would see a substantial increase in the spread of the CI.
Finally, we should always do the usual model diagnostics: residual plots, alternative model structure, etc. Here is a plot of residuals over fitted:
plot(residuals(flme) ~ fitted.values(flme))
Hm. Is it only me who can see some upwards trend in the right half of the plot? Maybe the trend is not linear through time? I add a quadratic term for year and its interactions, plus a cubic term for year alone (wouldn’t converge otherwise).
flme2 <- lme(ba ~ Treat*(year+I(year^2)) + I(year^3), random=~1|Block/plot, correlation=corExp(form=~year|Block/plot), data=dats, control=list(maxIter=5000))
anova(flme2)
## numDF denDF F-value p-value
## (Intercept) 1 131 327.3 <.0001
## Treat 2 16 0.3 0.7142
## year 1 131 53.4 <.0001
## I(year^2) 1 131 17.1 0.0001
## I(year^3) 1 131 1.8 0.1828
## Treat:year 2 131 4.2 0.0175
## Treat:I(year^2) 2 131 3.4 0.0380
summary(flme2)
## Linear mixed-effects model fit by REML
## Data: dats
## AIC BIC logLik
## 326.6 368.9 -149.3
##
## Random effects:
## Formula: ~1 | Block
## (Intercept)
## StdDev: 0.7075
##
## Formula: ~1 | plot %in% Block
## (Intercept) Residual
## StdDev: 0.7506 0.683
##
## Correlation Structure: Exponential spatial correlation
## Formula: ~year | Block/plot
## Parameter estimate(s):
## range
## 18.08
## Fixed effects: ba ~ Treat * (year + I(year^2)) + I(year^3)
## Value Std.Error DF t-value p-value
## (Intercept) 7.073 0.8198 131 8.627 0.0000
## TreatL -1.456 0.9514 16 -1.530 0.1455
## TreatLLTI -1.531 0.9847 16 -1.555 0.1396
## year -0.014 0.0371 131 -0.370 0.7118
## I(year^2) 0.003 0.0023 131 1.234 0.2196
## I(year^3) 0.000 0.0000 131 -1.339 0.1828
## TreatL:year 0.132 0.0390 131 3.386 0.0009
## TreatLLTI:year 0.111 0.0420 131 2.645 0.0092
## TreatL:I(year^2) -0.003 0.0012 131 -2.560 0.0116
## TreatLLTI:I(year^2) -0.002 0.0013 131 -1.805 0.0734
## Correlation:
## (Intr) TreatL TrLLTI year I(y^2) I(y^3) TrtL:y
## TreatL -0.862
## TreatLLTI -0.833 0.843
## year -0.128 0.107 0.104
## I(year^2) 0.036 -0.026 -0.025 -0.826
## I(year^3) -0.007 0.000 0.000 0.555 -0.906
## TreatL:year 0.118 -0.163 -0.098 -0.657 0.307 0.000
## TreatLLTI:year 0.110 -0.095 -0.170 -0.610 0.285 0.000 0.580
## TreatL:I(year^2) -0.056 0.077 0.047 0.604 -0.334 0.000 -0.919
## TreatLLTI:I(year^2) -0.052 0.045 0.081 0.561 -0.310 0.000 -0.533
## TrLLTI: TL:I(^
## TreatL
## TreatLLTI
## year
## I(year^2)
## I(year^3)
## TreatL:year
## TreatLLTI:year
## TreatL:I(year^2) -0.533
## TreatLLTI:I(year^2) -0.919 0.580
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -1.6073100 -0.3892879 0.0005888 0.4886959 2.0885852
##
## Number of Observations: 161
## Number of Groups:
## Block plot %in% Block
## 5 23
Ah! So there is evidence for a non-linear trend (in the summary: TreatL:I(year^2) is significant). What do the residuals for this model look like?
plot(residuals(flme2)~ fitted.values(flme2))
Better! Now we want to see the plot with the new model as well:
plot(ba ~ year, las=1, type="n", xlim=c(0, 35))
points(ba ~ year, data=dats[Treat=="C",], pch=16, cex=1.5)
points(ba ~ I(year+0.5), data=dats[Treat=="L",], pch=17, cex=1.5, col="red")
points(ba ~ I(year+1), data=dats[Treat=="LLTI",], pch=18, cex=1.5, col="gold")
newC <- data.frame("Treat"="C", year=0:35)
predsC <- predictSE(flme2, newdata=newC, level=0)
matlines(0:35, cbind(predsC$fit,predsC$fit + 2*predsC$se.fit, predsC$fit - 2*predsC$se.fit), lwd=2, lty=c(1,2,2), col="black")
newL <- data.frame("Treat"="L", year=0:35)
predsL <- predictSE(flme2, newdata=newL, level=0)
matlines(0:35, cbind(predsL$fit, predsL$fit + 2*predsL$se.fit, predsL$fit - 2*predsL$se.fit), lwd=2, lty=c(1,2,2), col="red")
newLT <- data.frame("Treat"="LLTI", year=0:35)
predsLT <- predictSE(flme2, newdata=newLT, level=0)
matlines(0:35, cbind(predsLT$fit, predsLT$fit + 2*predsLT$se.fit, predsLT$fit - 2*predsLT$se.fit), lwd=2, lty=c(1,2,2), col="gold")
legend("topleft", legend=c("control", "logging", "logging & thinning"), col=c("black", "red", "gold"), lty=1, lwd=2, bty="n", cex=1.25, pch=16:18)
We can also fit a spline for each treatment, using mgcv::gamm. The arguments in the spline call are necessary to avoid error messages: k=3 restricts the flexibility of the spline to the equivalent of a cubic polynomial, while bs="ts" employs shrinkage to make the splines as straight as possible (this is a very layman’s explanation of what shrinkage is). I also changed the correlation structure to corExp, which gives (here) a slightly better fit. And, finally, I had to increase the number of permissible iterations to allow for convergence:
library(mgcv)
## This is mgcv 1.8-2. For overview type 'help("mgcv-package")'.
fgamm <- gamm(ba ~ s(year, by=Treat, k=3, bs="ts"), random=list("Block"=~1, "plot"=~1), correlation=corExp(form=~year|Block/plot), data=dats, control=list(maxIter=500))
summary(fgamm$lme)
## Linear mixed-effects model fit by maximum likelihood
## Data: strip.offset(mf)
## AIC BIC logLik
## 263 287.6 -123.5
##
## Random effects:
## Formula: ~Xr - 1 | g
## Structure: pdIdnot
## Xr1 Xr2
## StdDev: 0.7772 0.7772
##
## Formula: ~Xr.0 - 1 | g.0 %in% g
## Structure: pdIdnot
## Xr.01 Xr.02
## StdDev: 13.89 13.89
##
## Formula: ~Xr.1 - 1 | g.1 %in% g.0 %in% g
## Structure: pdIdnot
## Xr.11 Xr.12
## StdDev: 8.22 8.22
##
## Formula: ~1 | Block %in% g.1 %in% g.0 %in% g
## (Intercept)
## StdDev: 0.4387
##
## Formula: ~1 | plot %in% Block %in% g.1 %in% g.0 %in% g
## (Intercept) Residual
## StdDev: 0.767 0.6619
##
## Correlation Structure: Exponential spatial correlation
## Formula: ~year | g/g.0/g.1/Block/plot
## Parameter estimate(s):
## range
## 16.45
## Fixed effects: y ~ X - 1
## Value Std.Error DF t-value p-value
## X 6.786 0.2762 138 24.57 0
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -1.77405 -0.38066 0.02686 0.46460 2.05775
##
## Number of Observations: 161
## Number of Groups:
## g
## 1
## g.0 %in% g
## 1
## g.1 %in% g.0 %in% g
## 1
## Block %in% g.1 %in% g.0 %in% g
## 5
## plot %in% Block %in% g.1 %in% g.0 %in% g
## 23
The interpretation is a bit more awkward, since we now have a GAM-part of the object and an LME-part. We can use the LME-part of the model to investigate how much variance is attributed to the different hierarchical levels. And, citing from the gamm help page (see there under “Value: lme”): “Note that the model formulae and grouping structures may appear to be rather bizarre, because of the manner in which the GAMM is split up and the calls to lme and gammPQL are constructed.” This means that we may not actually be completely sure whether the coding of the random effects is correct. (You may want to try, e.g., "Block"=~1|plot to see that this gives different estimates, especially for the range of spatial autocorrelation! I used this as guidance that it would not be the correct way to communicate the random-effect structure to gamm.)
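One compact way to pull out these variance components from the LME-part is nlme's VarCorr (just a convenience call added here; the printout also contains the internal Xr terms that represent the smooths):

```r
VarCorr(fgamm$lme)  # variance components of the LME representation of the GAMM
```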
First of all, the AIC is (substantially) lower for the GAMM than for the LME. (I actually had Treat as an additional fixed effect in the GAMM, but that model was slightly worse and so I deleted it.) Secondly, both the range and the Block/plot random effects are estimated very similarly to the LME. And finally, the treatment-specific smooths of year also indicate a different trend for the two logging treatments than for the control.
summary(fgamm$gam)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## ba ~ s(year, by = Treat, k = 3, bs = "ts")
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.786 0.276 24.6 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(year):TreatC 0.388 2 0.31 0.21
## s(year):TreatL 1.874 2 25.02 1.0e-10 ***
## s(year):TreatLLTI 1.625 2 14.69 2.2e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.19
## Scale est. = 0.43812 n = 161
The GAM-part of the model tells us that the “spline” for controls over time is horizontal (has virtually no estimated degree of freedom, edf), while those for the two treatments L and LLTI have a unimodal shape (2 edf). The control-trend is not significant, but those for the two logging treatments are.
Let’s plot also this model. Notice that we only use the GAM-part of the model, which has an se.fit option. Here the fixed-only structure of the predictions is more explicit than in the previous LMEs.
plot(ba ~ year, las=1, type="n", xlim=c(0, 35))
points(ba ~ year, data=dats[Treat=="C",], pch=16, cex=1.5)
points(ba ~ I(year+0.5), data=dats[Treat=="L",], pch=17, cex=1.5, col="red")
points(ba ~ I(year+1), data=dats[Treat=="LLTI",], pch=18, cex=1.5, col="gold")
newC <- data.frame("Treat"="C", year=0:35)
predsC <- predict(fgamm$gam, newdata=newC, se.fit=T)
matlines(0:35, cbind(predsC$fit,predsC$fit + 2*predsC$se.fit, predsC$fit - 2*predsC$se.fit), lwd=2, lty=c(1,2,2), col="black")
newL <- data.frame("Treat"="L", year=0:35)
predsL <- predict(fgamm$gam, newdata=newL, se.fit=T)
matlines(0:35, cbind(predsL$fit, predsL$fit + 2*predsL$se.fit, predsL$fit - 2*predsL$se.fit), lwd=2, lty=c(1,2,2), col="red")
newLT <- data.frame("Treat"="LLTI", year=0:35)
predsLT <- predict(fgamm$gam, newdata=newLT, se.fit=T)
matlines(0:35, cbind(predsLT$fit, predsLT$fit + 2*predsLT$se.fit, predsLT$fit - 2*predsLT$se.fit), lwd=2, lty=c(1,2,2), col="gold")
legend("topleft", legend=c("control", "logging", "logging & thinning"), col=c("black", "red", "gold"), lty=1, lwd=2, bty="n", cex=1.25, pch=16:18)
detach(dats)
A typical setting in which non-independence arises is the analysis of spatial data. Here the “first law of geography” holds: points near each other are more similar than those further away. This is called spatial autocorrelation, and it is the same phenomenon as we have seen in the time-series analysis above, just now in two dimensions.
Note that it is very important to distinguish between spatial autocorrelation in the raw data and in the residuals! The former may simply be a consequence of the fact that our environment is spatially autocorrelated; the latter is a statistical problem. If our data are only spatially autocorrelated because the environment is, then this is also called “spatial dependence”. Say we are interested in the distribution of a species, say the pine marten Martes martes in Europe. Then we would expect that forests are a good predictor, and forests are distributed in a clumped and hence spatially autocorrelated way. As a result, the raw data of pine marten distribution are also spatially autocorrelated, due to spatial dependence on forests. However, we could hope that, after accounting for forest cover, the model residuals are no longer spatially autocorrelated.
We will see that they still are, possibly because pine martens can move and thus are more likely to be found even in unsuitable sites near to suitable ones. This may also cause spatial autocorrelation (in the residuals).
load("martes.Rdata")
plot(NOFORIGIN ~ EOFORIGIN, data=martes, pch=15, col=ifelse(martes_martes==0, "grey80", "black"), cex=0.7, main="Pine marten distribution in Europe")
round(cor(martes), 3) # no problematic collinearity
## EOFORIGIN NOFORIGIN GDD PRE_YEAR WOOD martes_martes
## EOFORIGIN 1.000 0.210 -0.178 -0.266 0.170 0.186
## NOFORIGIN 0.210 1.000 -0.883 0.110 0.352 0.427
## GDD -0.178 -0.883 1.000 -0.280 -0.471 -0.529
## PRE_YEAR -0.266 0.110 -0.280 1.000 0.209 0.175
## WOOD 0.170 0.352 -0.471 0.209 1.000 0.434
## martes_martes 0.186 0.427 -0.529 0.175 0.434 1.000
fMM <- glm(martes_martes ~ GDD*PRE_YEAR*WOOD, family=binomial, data=martes)
anova(fMM, test="Chisq")
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: martes_martes
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 3036 3533
## GDD 1 890 3035 2643 < 2e-16 ***
## PRE_YEAR 1 6 3034 2638 0.018 *
## WOOD 1 158 3033 2479 < 2e-16 ***
## GDD:PRE_YEAR 1 52 3032 2428 6.9e-13 ***
## GDD:WOOD 1 192 3031 2235 < 2e-16 ***
## PRE_YEAR:WOOD 1 5 3030 2231 0.026 *
## GDD:PRE_YEAR:WOOD 1 27 3029 2203 1.7e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(fMM)
##
## Call:
## glm(formula = martes_martes ~ GDD * PRE_YEAR * WOOD, family = binomial,
## data = martes)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.265 -0.298 0.277 0.584 2.447
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 8.85e+00 7.10e-01 12.46 < 2e-16 ***
## GDD -2.91e-03 2.65e-04 -10.99 < 2e-16 ***
## PRE_YEAR -1.67e-03 8.31e-04 -2.02 0.0439 *
## WOOD 8.11e-01 1.94e-01 4.18 3.0e-05 ***
## GDD:PRE_YEAR 8.35e-07 3.35e-07 2.49 0.0128 *
## GDD:WOOD -9.99e-05 6.39e-05 -1.56 0.1179
## PRE_YEAR:WOOD 7.41e-04 2.55e-04 2.91 0.0036 **
## GDD:PRE_YEAR:WOOD -4.42e-07 9.79e-08 -4.52 6.3e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 3532.7 on 3036 degrees of freedom
## Residual deviance: 2203.2 on 3029 degrees of freedom
## AIC: 2219
##
## Number of Fisher Scoring iterations: 6
We can now analyse the residuals for spatial autocorrelation:
library(ncf)
resids <- residuals(fMM)
COR <- correlog(martes$EOFORIGIN, martes$NOFORIGIN, resids, increment=50000, resamp=1) # takes a while!
## 1 of 1
plot(COR)
abline(h=0)
What we see here is similar to the ACF-plot for time series: with spatial distance (on the x-axis) the similarity of data points decreases. This plot is called a correlogram. People more used to GIS will know its counterpart, the (semi-)variogram. It shows how the variance between points increases with distance, to level off at some “range”. This range should correspond roughly to the distance at which the correlogram drops to approximately 0.
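For an exponential correlation model, for example (this is what nlme's corExp assumes; corGaus is the squared-exponential analogue), the two descriptions are linked, ignoring a nugget effect, as \[ \text{cor}(d) = e^{-d/r}, \qquad \gamma(d) = \sigma^2\left(1 - e^{-d/r}\right), \] so the semi-variance \(\gamma\) levels off towards the sill \(\sigma^2\) over roughly the same distance \(r\) over which the correlation decays towards 0.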
The variogram has the advantage of being computable in different directions, e.g. towards north and east. To do so, we first have to turn the data into a specific format:
library(sp)
resids.spdf <- SpatialPointsDataFrame(coords=cbind(martes$EOFORIGIN, martes$NOFORIGIN), data=data.frame(resids))
library(gstat)
plot(variogram(resids~1, data=resids.spdf, alpha=c(0,90)))
We notice a slightly stronger spatial autocorrelation towards the east (90°).
To nicely depict the spatial pattern in the residuals, we can use sp’s bubble plot:
(b1 <- bubble(resids.spdf, maxsize=1.5, main="non-spatial GLM"))
In a dataset without residual spatial autocorrelation there would be no pattern, a nice mixture of colours. This is clearly not the case here, as we could already tell from correlogram and variogram.
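Purely for illustration (this shuffling is my addition, not part of the original analysis): if we randomly permute the residuals across the map, all spatial structure is destroyed, and the resulting bubble plot shows roughly what "no pattern" would look like.

```r
set.seed(1)
shuffled.spdf <- SpatialPointsDataFrame(coords=cbind(martes$EOFORIGIN, martes$NOFORIGIN),
                                        data=data.frame("resids"=sample(resids)))
bubble(shuffled.spdf, maxsize=1.5, main="shuffled residuals (no spatial pattern)")
```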
Now, in my humble opinion, the most common cause of residual spatial autocorrelation is model misspecification, i.e. omitting important variables from the model or specifying the wrong functional form (e.g. no quadratic effect). Let's see whether this makes a difference here, by also allowing for quadratic effects of the predictors (the interactions are already in the model):
fMM2 <- glm(martes_martes ~ poly(GDD, 2) + poly(PRE_YEAR, 2) + poly(WOOD, 2) + GDD*PRE_YEAR*WOOD, family=binomial, data=martes)
resids2 <- residuals(fMM2)
cor(resids, resids2) # highly correlated, so probably little change
## [1] 0.9648
resids2.spdf <- SpatialPointsDataFrame(coords=cbind(martes$EOFORIGIN, martes$NOFORIGIN), data=data.frame(resids2))
plot(variogram(resids2~1, data=resids2.spdf, alpha=c(0,90))) # no correlog, takes too long
bubble(resids2.spdf, maxsize=1.5)
This made little difference: we cannot reduce the residual spatial autocorrelation (rSAC) substantially, although the model itself is substantially better (AIC of fMM = 2219.2459, AIC of fMM2 = 2016.6409). (There may well be many other predictors missing, which will always be the case.) Instead of further complicating the fixed effects of the model, we shall look at how to incorporate distance-decaying similarity (i.e. residual spatial autocorrelation):
library(MASS)
fake <- as.factor(rep("a", nrow(martes))) # constant dummy grouping factor: glmmPQL insists on a random-effect term
set.seed(2)
some.rows <- sample(nrow(martes), 300) # random subsample of 300 cells, since fitting the full data set would take very long
martes2 <- cbind(martes, fake)[some.rows,] # data subset with the dummy factor attached
fglmmPQL <- glmmPQL(martes_martes ~ GDD*PRE_YEAR*WOOD, random=~1|fake, correlation=corGaus(form=~EOFORIGIN+NOFORIGIN), family=binomial, data=martes2, control=list(maxIter=100))
## iteration 1
## iteration 2
## iteration 3
## iteration 4
## iteration 5
## iteration 6
summary(fglmmPQL)
## Linear mixed-effects model fit by maximum likelihood
## Data: martes2
## AIC BIC logLik
## NA NA NA
##
## Random effects:
## Formula: ~1 | fake
## (Intercept) Residual
## StdDev: 9.51e-05 0.8964
##
## Correlation Structure: Gaussian spatial correlation
## Formula: ~EOFORIGIN + NOFORIGIN | fake
## Parameter estimate(s):
## range
## 111702
## Variance function:
## Structure: fixed weights
## Formula: ~invwt
## Fixed effects: martes_martes ~ GDD * PRE_YEAR * WOOD
## Value Std.Error DF t-value p-value
## (Intercept) 9.564 2.4207 292 3.951 0.0001
## GDD -0.003 0.0008 292 -3.483 0.0006
## PRE_YEAR -0.003 0.0027 292 -1.153 0.2500
## WOOD 0.429 0.5615 292 0.764 0.4453
## GDD:PRE_YEAR 0.000 0.0000 292 0.997 0.3196
## GDD:WOOD 0.000 0.0002 292 -0.255 0.7986
## PRE_YEAR:WOOD 0.001 0.0008 292 1.044 0.2973
## GDD:PRE_YEAR:WOOD 0.000 0.0000 292 -1.429 0.1541
## Correlation:
## (Intr) GDD PRE_YEAR WOOD GDD:PRE_YEAR GDD:WO
## GDD -0.934
## PRE_YEAR -0.907 0.878
## WOOD 0.699 -0.617 -0.766
## GDD:PRE_YEAR 0.802 -0.910 -0.919 0.641
## GDD:WOOD -0.671 0.685 0.768 -0.924 -0.751
## PRE_YEAR:WOOD -0.542 0.499 0.738 -0.920 -0.646 0.901
## GDD:PRE_YEAR:WOOD 0.465 -0.505 -0.677 0.788 0.694 -0.919
## PRE_YEAR:
## GDD
## PRE_YEAR
## WOOD
## GDD:PRE_YEAR
## GDD:WOOD
## PRE_YEAR:WOOD
## GDD:PRE_YEAR:WOOD -0.917
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -7.0837 -0.4586 0.2533 0.4784 1.9581
##
## Number of Observations: 300
## Number of Groups: 1
resids2.spdf <- SpatialPointsDataFrame(coords=cbind(martes2$EOFORIGIN, martes2$NOFORIGIN), data=data.frame("resids"=residuals(fglmmPQL)))
plot(variogram(resids~1, data=resids2.spdf, alpha=c(0,90)))
b1 <- bubble(resids.spdf[some.rows,], maxsize=1.5, main="non-spatial GLM")
b2 <- bubble(resids2.spdf, maxsize=1.5, main="spatial GLMM")
library(gridExtra)
## Loading required package: grid
grid.arrange(b1, b2, ncol=2)
Not untypically, the residuals are only moderately improved compared to the non-spatial GLM. Often the improvement is greater when changing to a more flexible modelling approach (say, Boosted Regression Trees), again pointing towards model misspecification rather than biological processes as the driver behind (some part of) the residual spatial autocorrelation.
Let’s check whether the more flexible GAM can do better (if we have about half an hour spare):
fgamm <- gamm(martes_martes ~ s(GDD, k=3)+s(PRE_YEAR, k=3)+s(WOOD, k=3), random=list("fake"=~1), correlation=corGaus(form=~EOFORIGIN+NOFORIGIN), family=binomial, data=martes2, niterPQL=70, verbosePQL=T)
##
## Maximum number of PQL iterations: 70
## iteration 1
## iteration 2
## iteration 3
## iteration 4
## iteration 5
## iteration 6
## iteration 7
## iteration 8
summary(fgamm$lme)
## Linear mixed-effects model fit by maximum likelihood
## Data: data
## AIC BIC logLik
## 1217 1254 -598.6
##
## Random effects:
## Formula: ~Xr - 1 | g
## Xr
## StdDev: 16.81
##
## Formula: ~Xr.0 - 1 | g.0 %in% g
## Xr.0
## StdDev: 0.02181
##
## Formula: ~Xr.1 - 1 | g.1 %in% g.0 %in% g
## Xr.1
## StdDev: 4.895
##
## Formula: ~1 | fake %in% g.1 %in% g.0 %in% g
## (Intercept) Residual
## StdDev: 7.621e-06 0.8589
##
## Correlation Structure: Gaussian spatial correlation
## Formula: ~EOFORIGIN + NOFORIGIN | g/g.0/g.1/fake
## Parameter estimate(s):
## range
## 120275
## Variance function:
## Structure: fixed weights
## Formula: ~invwt
## Fixed effects: list(fixed)
## Value Std.Error DF t-value p-value
## X(Intercept) 1.0823 0.1942 296 5.573 0.0000
## Xs(GDD)Fx1 1.1212 0.1926 296 5.821 0.0000
## Xs(PRE_YEAR)Fx1 0.1046 0.1626 296 0.643 0.5208
## Xs(WOOD)Fx1 -0.2770 0.1379 296 -2.008 0.0456
## Correlation:
## X(Int) X(GDD) X(PRE_
## Xs(GDD)Fx1 0.252
## Xs(PRE_YEAR)Fx1 0.131 0.258
## Xs(WOOD)Fx1 -0.053 0.163 -0.244
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.6984 -0.6232 0.4321 0.5779 3.6156
##
## Number of Observations: 300
## Number of Groups:
## g g.0 %in% g
## 1 1
## g.1 %in% g.0 %in% g fake %in% g.1 %in% g.0 %in% g
## 1 1
resids3.spdf <- SpatialPointsDataFrame(coords=cbind(martes2$EOFORIGIN, martes2$NOFORIGIN), data=data.frame("resids"=residuals(fgamm$gam)))
vario1 <- variogram(resids ~ 1, data=resids2.spdf, alpha=0)
vario2 <- variogram(resids ~ 1, data=resids3.spdf, alpha=0)
plot(vario1$dist, vario1$gamma, cex=2, pch=15, type="b", ylab="semi-variance", xlab="distance [m]", ylim=c(0, 20))
points(vario2$dist, vario2$gamma, pch=16, cex=2, type="b")
legend("topleft", pch=15:16, cex=1.5, bty="n", legend=c("GLMM", "GAMM"), lty=1)
b3 <- bubble(resids3.spdf, maxsize=1.5, main="spatial GAMM")
grid.arrange(b1, b2, b3, ncol=3)
The point should be clear: we can use a mixed-model approach to compute a spatially parametrised variance-covariance matrix. This is typically very slow and not always entirely satisfactory. This is not the place to go into details about spatial models; the example mainly served to illustrate spatial dependence as another indication for mixed-effect models.
To read up on other ways to incorporate spatial autocorrelation check out RINLA, spBayes and the following publications (those in parentheses are more of historical interest):